Search CORE

680 research outputs found

LAS: a software platform to support oncological data management

Author: A El Akadi
A Tolopko
Alberto Grand
Alessandro Fiori
Andrea Bertotti
Elena Baralis
H Maier
J Wang
K Kuhn
S Clément
S Haquin
TR Golub
Publication venue: Springer Science+Business Media New York
Publication date: 01/01/2012

The rapid technological evolution in the biomedical and molecular oncology fields is providing research laboratories with huge amounts of complex and heterogeneous data. Automated systems are needed to manage and analyze this knowledge, allowing the discovery of new information related to tumors and the improvement of medical treatments. This paper presents the Laboratory Assistant Suite (LAS), a software platform with a modular architecture designed to assist researchers throughout diverse laboratory activities. The LAS supports the management and the integration of heterogeneous biomedical data, and provides graphical tools to build complex analyses on integrated data. Furthermore, the LAS interfaces are designed to ease data collection and management even in hostile environments (e.g., in sterile conditions), so as to improve data qualit

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Structural mechanism of synergistic activation of Aurora kinase B/C by phosphorylated INCENP

Author: Abdul Azeez KR
Chatterjee S
Elkins JM
Golub TR
Sobott F
Yu C
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Aurora kinases B and C (AURKB/AURKC) are activated by binding to the C-terminal domain of INCENP. Full activation requires phosphorylation of two serine residues of INCENP that are conserved through evolution, although the mechanism of this activation has not been explained. Here we present crystal structures of the fully active complex of AURKC bound to INCENP, consisting of phosphorylated, activated, AURKC and INCENP phosphorylated on its TSS motif, revealing the structural and biochemical mechanism of synergistic activation of AURKC:INCENP. The structures show that TSS motif phosphorylation stabilises the kinase activation loop of AURKC. The TSS motif phosphorylations alter the substrate-binding surface consistent with a mechanism of altered kinase substrate selectivity and stabilisation of the protein complex against unfolding. We also analyse the binding of the most specific available AURKB inhibitor, BRD-7880, and demonstrate that the well-known Aurora kinase inhibitor VX-680 disrupts binding of the phosphorylated INCENP TSS motif

Oxford University Research Archive

Institutional Repository Universiteit Antwerpen

Repositorio da Producao Cientifica e Intelectual da Unicamp

White Rose Research Online

Genomic approaches to research in lung cancer

Author: A Maitra
AA Alizadeh
JR Pollack
K Hibi
M Schena
MD Adams
MR Emmert-Buck
RL Strausberg
SA Ahrendt
SS Wang
T Wang
TR Golub
VE Velculescu
Publication venue: BioMed Central
Publication date: 01/06/2000

The medical research community is experiencing a marked increase in the amount of information available on genomic sequences and genes expressed by humans and other organisms. This information offers great opportunities for improving our understanding of complex diseases such as lung cancer. In particular, we should expect to witness a rapid increase in the rate of discovery of genes involved in lung cancer pathogenesis and we should be able to develop reliable molecular criteria for classifying lung cancers and predicting biological properties of individual tumors. Achieving these goals will require collaboration by scientists with specialized expertise in medicine, molecular biology, and decision-based statistical analysis

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Unsupervised Multi-Omic Data Fusion: the Neural Graph Learning Network

Author: A Chu
AK Jain
B Wang
G Cirrincione
K Chaudhary
K Tomczak
MA Jensen
MI Love
N Altman
N Rappoport
S Anders
S Gao
T Hubbard
TR Golub
W Huber
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

In recent years, due to the high availability of omic data, data-driven biology has greatly expanded. However, the analysis of different data sources is still an open challenge. A few multi-omics approaches have been proposed in the literature, but none of them takes into consideration the intrinsic topology of each omic. In this work, an unsupervised learning method based on a deep neural network is proposed. For each omic, a separate network is trained, whose outputs are fused into a single graph; to this end, an innovative loss function has been designed to better represent the data cluster manifolds. The graph adjacency matrix is exploited to determine similarities among samples. With this approach, omics having a different number of features are merged into a unique representation. Quantitative and qualitative analyses show that the proposed method has comparable results to the state of the art. The method has great intrinsic flexibility as it can be customized according to the complexity of the tasks and it has a lot of room for future improvements compared to more fine-tuned methods, opening the way for future research

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Structural similarity assessment for drug sensitivity prediction in cancer

Author: A Monks
DS Gilmour
DT Ross
E Sayers
J Khan
J Perret
JE Staunton
JK Lee
LM Shi
LN Harris
Michael Krauthammer
Pavithra Shivakumar
SJ Swamidass
T Sørlie
TR Golub
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The ability to predict drug sensitivity in cancer is one of the exciting promises of pharmacogenomic research. Several groups have demonstrated the ability to predict drug sensitivity by integrating chemo-sensitivity data and associated gene expression measurements from large anti-cancer drug screens such as NCI-60. The general approach is based on comparing gene expression measurements from sensitive and resistant cancer cell lines and deriving drug sensitivity profiles consisting of lists of genes whose expression is predictive of response to a drug. Importantly, it has been shown that such profiles are generic and can be applied to cancer cell lines that are not part of the anti-cancer screen. However, one limitation is that the profiles cannot be generated for untested drugs (i.e., drugs that are not part of an anti-cancer drug screen). In this work, we propose using an existing drug sensitivity profile for drug A as a substitute for an untested drug B given high structural similarities between drugs A and B. Results: We first show that structural similarity between pairs of compounds in the NCI-60 dataset highly correlates with the similarity between their activities across the cancer cell lines. This result shows that structurally similar drugs can be expected to have a similar effect on cancer cell lines. We next set out to test our hypothesis that we can use existing drug sensitivity profiles as substitute profiles for untested drugs. In a cross-validation experiment, we found that the use of substitute profiles is possible without a significant loss of prediction accuracy if the substitute profile was generated from a compound with high structural similarity to the untested compound. Conclusion: Anti-cancer drug screens are a valuable resource for generating omics-based drug sensitivity profiles.
We show that it is possible to extend the usefulness of existing screens to untested drugs by deriving substitute sensitivity profiles from structurally similar drugs that are part of the screen.
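
The substitution idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which structural similarity measure the authors used; the Tanimoto coefficient on fingerprint bit sets is assumed here because it is a common choice. The drug names, fingerprints, the `pick_substitute` helper, and the 0.6 threshold are all hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_substitute(untested, tested, threshold=0.6):
    """Return (name, similarity) of the most similar tested drug,
    or None when no tested drug reaches the (assumed) threshold."""
    best_name, best_sim = None, 0.0
    for name, bits in tested.items():
        sim = tanimoto(untested, bits)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return (best_name, best_sim) if best_sim >= threshold else None


# Hypothetical fingerprints: each set holds the indices of the "on" bits.
tested_drugs = {
    "drug_A": frozenset({1, 2, 3, 4, 5}),
    "drug_C": frozenset({10, 11, 12}),
}
drug_B = frozenset({1, 2, 3, 4, 6})

# drug_B shares 4 of 6 distinct bits with drug_A, so drug_A's profile
# would be borrowed as the substitute profile for drug_B.
result = pick_substitute(drug_B, tested_drugs)
```

In this toy setting, the sensitivity profile of the returned drug would stand in for the untested one; a drug with no sufficiently similar neighbour gets no substitute.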

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data

Author: B Efron
C Robert
CL Mallows
DJC MacKay
EI George
G Schwarz
H Akaike
I Guyon
I Hedenfalk
J Khan
J Zhu
JT Kwok
KE Lee
Leo Wang-Kit Cheung
M Dettling
M Yuan
N Cristianini
P Tamayo
R Tibshirani
RM Neal
S Barnett
S Ramaswamy
TR Golub
TV Gestel
U Alon
X Zhou
Xin Zhao
Y Lin
Publication venue: BioMed Central
Publication date: 01/02/2007

BACKGROUND: Designing appropriate machine learning methods for identifying genes that have a significant discriminating power for disease outcomes has become more and more important for our understanding of diseases at the genomic level. Although many machine learning methods have been developed and applied to the area of microarray gene expression data analysis, the majority of them are based on linear models, which, however, are not necessarily appropriate for the underlying connection between the target disease and its associated explanatory genes. Linear model based methods usually also bring in false positive significant features more easily. Furthermore, linear model based algorithms often involve calculating the inverse of a matrix that is possibly singular when the number of potentially important genes is relatively large. This leads to problems of numerical instability. To overcome these limitations, a few non-linear methods have recently been introduced to the area. Many of the existing non-linear methods have a couple of critical problems, the model selection problem and the model parameter tuning problem, that remain unsolved or even untouched. In general, a unified framework that allows model parameters of both linear and non-linear models to be easily tuned is always preferred in real-world applications. Kernel-induced learning methods form a class of approaches that show promising potential to achieve this goal. RESULTS: A hierarchical statistical model named kernel-imbedded Gaussian process (KIGP) is developed under a unified Bayesian framework for binary disease classification problems using microarray gene expression data. In particular, based on a probit regression setting, an adaptive algorithm with a cascading structure is designed to find the appropriate kernel, to discover the potentially significant genes, and to make the optimal class prediction accordingly. A Gibbs sampler is built as the core of the algorithm to make Bayesian inferences.
Simulation studies showed that, even without any knowledge of the underlying generative model, the KIGP performed very close to the theoretical Bayesian bound not only in the case with a linear Bayesian classifier but also in the case with a very non-linear Bayesian classifier. This sheds light on its broader usability for microarray data analysis problems, especially for those in which linear methods work awkwardly. The KIGP was also applied to four published microarray datasets, and the results showed that the KIGP performed better than or at least as well as any of the referred state-of-the-art methods did in all of these cases. CONCLUSION: Mathematically built on the kernel-induced feature space concept under a Bayesian framework, the KIGP method presented in this paper provides a unified machine learning approach to explore both the linear and the possibly non-linear underlying relationship between the target features of a given binary disease classification problem and the related explanatory gene expression data. More importantly, it incorporates the model parameter tuning into the framework. The model selection problem is addressed in the form of selecting a proper kernel type. The KIGP method also gives Bayesian probabilistic predictions for disease classification. These properties and features are beneficial to most real-world applications. The algorithm is naturally robust in numerical computation. The simulation studies and the published data studies demonstrated that the proposed KIGP performs satisfactorily and consistently
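
To make "selecting a proper kernel type" concrete, here is a minimal sketch of two standard kernels and the Gram matrix each induces over a set of samples. It is not the KIGP algorithm: there is no probit model or Gibbs sampler, and the sample values, the `width` parameter, and the function names are illustrative assumptions.

```python
import math


def linear_kernel(x, y):
    """Inner product: corresponds to a linear model in the original features."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel: allows non-linear decision boundaries."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * width ** 2))


def gram_matrix(samples, kernel):
    """Pairwise kernel evaluations between all samples."""
    return [[kernel(x, y) for y in samples] for x in samples]


# Hypothetical expression values for three samples over two genes.
samples = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

K_lin = gram_matrix(samples, linear_kernel)
K_rbf = gram_matrix(samples, gaussian_kernel)
```

Swapping the kernel function changes the model class while the rest of a kernel-based pipeline stays the same, which is the sense in which model selection reduces to choosing a kernel type.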

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SMART: Unique splitting-while-merging framework for gene clustering

Author: A Thalamuthu
AD Lanterman
AE Teschendorff
AK Jain
Asoke K. Nandi
B Abu-Jamous
B Fritzke
CR Lin
CS Wallace
D Dembele
D Jiang
David J. Roberts
G Celeux
H Akaike
J Qin
J Rissanen
KY Yeung
L Hubert
L Mavridis
L Zhao
MAT Figueiredo
P Tamayo
PT Spellman
R Xu
RJ Cho
Rui Fa
S Bandyopadhyay
S Monti
S Wu
Sergio Gómez
T Kohonen
T Pramila
TR Golub
WM Rand
YJ Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 08/04/2014

Copyright © 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to compared existing self-splitting algorithms and traditional algorithms.
Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms. National Institute for Health Researc

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Brunel University Research Archive

Identification of disease-causing genes using microarray data mining and gene ontology

Author: A Mohammadi
A Zhang
AA Alizadeh
Azadeh Mohammadi
B Duval
BF Souza
C Ambroise
C Ding
C Tago
D Lin
D Singh
E Martinez
FM Couto
I Guyon
I Inza
J Jaeger
JJ Jiang
L Li
L Yu
L Ziaei
Mansoor Salehi
Mohammad H Saraee
N Cristianini
P Pavlidis
P Resnik
PA Mundra
PJ Park
R Genuer
RF Weaver
S Li
TM Huang
TR Golub
TS Furey
U Alon
W Xu
Y Ding
Y Saeys
Y Wang
YL Chin
Z Xie
Z Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Background: One of the best and most accurate methods for identifying disease-causing genes is monitoring gene expression values in different samples using microarray technology. One of the shortcomings of microarray data is that they provide a small quantity of samples with respect to the number of genes. This problem reduces the classification accuracy of the methods, so gene selection is essential to improve the predictive accuracy and to identify potential marker genes for a disease. Among numerous existing methods for gene selection, support vector machine-based recursive feature elimination (SVMRFE) has become one of the leading methods, but its performance can be reduced because of the small sample size, noisy data and the fact that the method does not remove redundant genes. Methods: We propose a novel framework for gene selection which uses the advantageous features of conventional methods and addresses their weaknesses. In fact, we have combined the Fisher method and SVMRFE to utilize the advantages of a filtering method as well as an embedded method. Furthermore, we have added a redundancy reduction stage to address the weakness of the Fisher method and SVMRFE. In addition to gene expression values, the proposed method uses Gene Ontology which is a reliable source of information on genes. The use of Gene Ontology can compensate, in part, for the limitations of microarrays, such as having a small number of samples and erroneous measurement results. Results: The proposed method has been applied to colon, Diffuse Large B-Cell Lymphoma (DLBCL) and prostate cancer datasets. The empirical results show that our method has improved classification performance in terms of accuracy, sensitivity and specificity. In addition, the study of the molecular function of selected genes strengthened the hypothesis that these genes are involved in the process of cancer growth. 
Conclusions: The proposed method addresses the weakness of conventional methods by adding a redundancy reduction stage and utilizing Gene Ontology information. It predicts marker genes for colon, DLBCL and prostate cancer with a high accuracy. The predictions made in this study can serve as a list of candidates for subsequent wet-lab verification and might help in the search for a cure for cancers
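
The filtering stage mentioned in this abstract can be sketched as follows. The abstract names "the Fisher method" without giving a formula; the two-class Fisher score used here, (mean difference squared) / (sum of class variances), is one common definition and is an assumption. Gene names and expression values are hypothetical, and the SVM-RFE, redundancy-reduction, and Gene Ontology stages are not shown.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score of one gene: between-class separation
    divided by within-class spread (assumed definition)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        # No within-class spread: perfectly separating or uninformative.
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / denom


def rank_genes(expression, labels):
    """Order gene names from most to least discriminative."""
    scores = {gene: fisher_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: three tumour samples (1) and three normal samples (0).
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_x": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # separates the classes well
    "gene_y": [1.0, 5.0, 3.0, 2.0, 4.0, 3.0],  # same mean in both classes
}

ranking = rank_genes(expression, labels)
```

A filter like this is cheap because it scores each gene independently, which is also why it cannot detect redundant genes; that gap is what a later redundancy-reduction stage would address.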

University of Salford Institutional Repository

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Expression profiling to predict outcome in breast cancer: the influence of sample selection

Author: AA Alizadeh
Carsten Peterson
CM Perou
J Khan
LJ van't Veer
M Bittner
M West
Markus Ringnér
Mårten Fernö
Patrik Edén
Paul S Meltzer
S Gruvberger
Sofia K Gruvberger
SV Allander
T Sorlie
TR Golub
Åke Borg
Publication venue: BioMed Central
Publication date: 11/10/2002

Gene expression profiling of tumors using DNA microarrays is a promising method for predicting prognosis and treatment response in cancer patients. It was recently reported that expression profiles of sporadic breast cancers could be used to predict disease recurrence better than currently available clinical and histopathological prognostic factors. Having observed an overlap in those data between the genes that predict outcome and those that predict estrogen receptor-α status, we examined their predictive power in an independent data set. We conclude that it may be important to define prognostic expression profiles separately for estrogen receptor-α-positive and estrogen receptor-α-negative tumors

Lund University Publications

Crossref

PubMed Central

Delineation of prognostic biomarkers in prostate cancer

Author: A Tsuji
AA Alizadeh
C Abate-Shen
CM Perou
CR Pound
DF Gleason
E Ruijter
EE Perrone
J Elek
J Kononen
K Tomita
L Liotta
M Bittner
MA Rubin
MB Eisen
MJ Barry
MR Emmert-Buck
MS Shurbaji
R Buttyan
TR Golub
Publication venue: Springer Science and Business Media LLC
Publication date: 23/08/2001

Prostate cancer is the most frequently diagnosed cancer in American men (1,2). Screening for prostate-specific antigen (PSA) has led to earlier detection of prostate cancer (3), but elevated serum PSA levels may be present in non-malignant conditions such as benign prostatic hyperplasia (BPH). Characterization of gene-expression profiles that molecularly distinguish prostatic neoplasms may identify genes involved in prostate carcinogenesis, elucidate clinical biomarkers, and lead to an improved classification of prostate cancer (4-6). Using microarrays of complementary DNA, we examined gene-expression profiles of more than 50 normal and neoplastic prostate specimens and three common prostate-cancer cell lines. Signature expression profiles of normal adjacent prostate (NAP), BPH, localized prostate cancer, and metastatic, hormone-refractory prostate cancer were determined. Here we establish many associations between genes and prostate cancer. We assessed two of these genes (hepsin, a transmembrane serine protease, and pim-1, a serine/threonine kinase) at the protein level using tissue microarrays consisting of over 700 clinically stratified prostate-cancer specimens. Expression of hepsin and pim-1 proteins was significantly correlated with measures of clinical outcome. Thus, the integration of cDNA microarray, high-density tissue microarray, and linked clinical and pathology data is a powerful approach to molecular profiling of human cancer. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/62849/1/412822a0.pd

Crossref

Deep Blue Documents at the University of Michigan